Executive Summary



This report presents findings from the Enhanced Reading Opportunities (ERO) study — 
a demonstration and rigorous evaluation of two supplemental literacy programs that aim to im- 
prove the reading comprehension skills and school performance of struggling ninth-grade read- 
ers. The U.S. Department of Education’s (ED) Office of Elementary and Secondary Education 
(OESE) 1 is funding the implementation of these programs, and its Institute of Education
Sciences (IES) is responsible for oversight of the evaluation. MDRC — a nonprofit, nonpartisan 
education and social policy research organization — is conducting the evaluation in partnership 
with the American Institutes for Research (AIR) and Survey Research Management (SRM). 

The present report — the second of three — focuses on the second of two cohorts of 
ninth-grade students to participate in the study and discusses the impact that the two interven- 
tions had on these students’ reading comprehension skills through the end of their ninth-grade 
year. The report also describes the implementation of the programs during the second year of 
the study and provides an assessment of the overall fidelity with which the participating schools 
adhered to the program design as specified by the developers. While this report focuses primari- 
ly on implementation and impacts in the second year of the study, comparisons between the first 
and second year of the study are also provided. 2 The key findings discussed in the report include
the following: 

• On average, across the 34 participating high schools, the supplemental 
literacy programs improved student reading comprehension test scores 
by 0.08 standard deviation. This represents a statistically significant im- 
provement in students’ reading comprehension (p-value = 0.042). 

• Seventy-seven percent of the students who enrolled in the ERO classes in 
the second year of the study were still reading at two or more years be- 
low grade level at the end of ninth grade, relative to the expected read- 
ing achievement of a nationally representative sample of ninth-grade 
students. 3 One of the two interventions, Reading Apprenticeship Academic
Literacy (RAAL), had a positive and statistically significant
impact on reading comprehension test scores (0.14 standard deviation;
p-value = 0.015). Although not statistically significant, a positive impact
on reading comprehension (0.02 standard deviation) was also produced
by the other intervention, Xtreme Reading. The difference in impacts
between the two programs is not statistically significant, and thus it
cannot be concluded that RAAL had a different effect on reading
comprehension than Xtreme Reading. 4

1 The implementation was initially funded by the Office of Vocational and Adult Education (OVAE), but
this role was later transferred to OESE.

2 James J. Kemple, William Corrin, Elizabeth Nelson, Terry Salinger, Suzannah Herrmann, and Kathryn
Drummond, The Enhanced Reading Opportunities Study: Early Impacts and Implementation Findings, NCEE
2008-4015 (Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center
for Education Evaluation and Regional Assistance, 2008).

3 Forty percent of ninth-graders nationally would be expected to score at two or more years below grade
level on the same assessment.

• The overall impact of the ERO programs on reading comprehension test 
scores in the second year of implementation (0.08 standard deviation) is 
not statistically different from their impact in the first year of implemen- 
tation (0.09 standard deviation), nor is each intervention’s impact in the 
second year of implementation statistically different from its impact in 
the first year. 

• The implementation fidelity of the ERO programs was more highly 
rated in the second year of the study than in the first year. In compari- 
son with the first year, a greater number of schools in the second year of 
the study were deemed to have programs that were well aligned with the 
program developers’ specifications for implementation fidelity (26 
schools in the second year, compared with 16 schools in the first year), 
and fewer schools were considered to be poorly aligned (one school in 
the second year, compared with 10 schools in the first year). 



4 It is important to note that the ERO study is an evaluation of a class of reading interventions, as 
represented by Xtreme Reading and RAAL, as well as an evaluation of each of these two programs separately. 
The purpose of the study is not to test the differential impact of these two interventions; while Xtreme Reading 
and RAAL do differ in some respects, they are both full-year supplemental literacy courses targeted at strug- 
gling adolescent readers that share many common principles, and hence there was no prior expectation that 
they would produce substantially different impacts. As noted below, the design of the study is such that pro- 
grams are randomized to schools; however, the purpose of this randomization was to ensure that each program 
developer was assigned a fair draw of schools in which to implement its program, rather than to test for a diffe- 
rential impact between the two interventions. By this token, the statistical model chosen for the impact analysis 
does not utilize the school-level randomization feature of the research design; nor is the sample size large 
enough to detect policy-relevant differences in impacts across the two programs. Because Xtreme Reading and 
RAAL represent the same type of intervention, this study was designed to test their joint or overall impact. 
Statistical tests were used to confirm that the difference in impacts between the two programs is not statistically
significant and, hence, that it is indeed appropriate to pool together the two program-specific impact estimates;
these statistical tests are not appropriate for making inferences about the true difference in impacts between
the two interventions.
The Supplemental Literacy Interventions

The ERO study is a test of supplemental literacy interventions that are designed as full- 
year courses and targeted to students whose reading skills are two or more years below grade 
level as they enter high school. Two programs — Reading Apprenticeship Academic Literacy 
(RAAL), designed by WestEd, and Xtreme Reading, designed by the University of Kansas 
Center for Research on Learning — were selected for the study from a pool of 17 applicants by 
a national panel of experts on adolescent literacy. To qualify for the project, the programs were 
required to focus instruction in the following areas: (1) student motivation and engagement; (2) 
reading fluency, or the ability to read quickly, accurately, and with appropriate expression; (3) 
vocabulary, or word knowledge; (4) comprehension, or making meaning from text; (5) phonics 
and phonemic awareness (for students who could still benefit from instruction in these areas); 
and (6) writing. The overarching goals of both programs are to help ninth-grade students adopt 
the strategies and routines used by proficient readers, improve their comprehension skills, and 
be motivated to read more and to enjoy reading. Both programs are supplemental in that they 
consist of a yearlong course that replaces a ninth-grade elective class, rather than a core academ- 
ic class, and in that they are offered in addition to students’ regular English language arts 
classes. 



The primary differences between the two literacy interventions selected for the ERO 
study lie in their approach to implementation. Implementation of RAAL is guided by the con- 
cept of “flexible fidelity” — that is, while the program includes a detailed curriculum, the 
teachers are trained to adapt their lessons to meet the needs of their students and to supplement 
program materials with readings that are motivating to their classes. Teachers have flexibility in 
how they include various aspects of the RAAL curriculum in their day-to-day teaching activi- 
ties, but they have been trained to do so such that they maintain the overarching spirit, themes, 
and goals of the program in their instruction. 

Implementation of Xtreme Reading is guided by the philosophy that the presentation 
of instructional material — particularly the order and timing with which the lessons are pre- 
sented — is of critical import to students’ understanding of the strategies and skills being 
taught. As such, teachers are trained to deliver course content and materials in a precise, orga- 
nized, and systematic fashion designed by the developers. Xtreme Reading teachers follow a 
prescribed implementation plan, following specific day-by-day lesson plans in which activities 
have allotted segments of time within each class period. Teachers also use responsive instruc- 
tional practices to adapt and adjust to student needs that arise as they move through the highly 
structured curriculum. 
Overview of the Study



Interventions. Reading Apprenticeship Academic Literacy (RAAL) and Xtreme Reading — 
supplemental literacy programs designed as full-year courses to replace a ninth-grade elective 
class. The programs were selected through a competitive applications process based on ratings by 
an expert panel. 

Study sample. Two cohorts of ninth-grade students from 34 high schools and 10 school districts 
(2,916 students in Cohort 1 and 2,679 students in Cohort 2). Districts and schools were selected 
by ED’s Office of Vocational and Adult Education through a special Small Learning Communi- 
ties grant competition. Students were selected based on reading comprehension test scores that 
were between two and five years below grade level. 

Research design. Within each district, high schools were randomly assigned to use either the 
RAAL program or the Xtreme Reading program during two school years (2005-2006 and 2006- 
2007). Within each high school, students were randomly assigned to enroll in the ERO class or to 
remain in a regularly scheduled elective class. A reading comprehension test and a survey were 
administered to students in the spring of eighth grade or at the start of ninth grade, prior to random 
assignment, and again at the end of ninth grade. Classroom observations in the first and second 
semester of the school year were used to measure implementation fidelity. 

Outcomes. Reading comprehension and vocabulary test scores, reading behaviors, student atten- 
dance in the ERO classes and other literacy support services, implementation fidelity. 



The ERO Evaluation 

The supplemental literacy programs were implemented in 34 high schools from 10 
school districts across the country. The districts were selected through a special grant competi- 
tion organized by the U.S. Department of Education’s Office of Vocational and Adult Educa- 
tion (OVAE). Experienced, full-time English/language arts or social studies teachers were self- 
selected and approved by ED, the districts, and the schools to teach the programs for a period of 
two years. 

The ERO evaluation utilizes a two-level random assignment research design. First, 
within each district, eligible high schools were randomly assigned prior to the first year of 
program implementation to use one of the two supplemental literacy programs: 17 of the high 
schools were assigned to use RAAL, and 17 schools were selected to use Xtreme Reading. 
Each school implemented the same program in two school years: 2005-2006 and 2006-2007. 
In the second stage of the study design, eligible students within each of the participating high 
schools and in each year of the study were randomly assigned either to enroll in the ERO class 
(the “ERO group”) or to take one of their school’s regularly offered elective classes (the “non-ERO group”).
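
The two-level design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the district and school names are hypothetical, the alternating split within a district is one simple way to balance the two programs (the report does not describe the exact procedure), and the 57 percent ERO share is borrowed from the second-year figure reported for the study sample.

```python
import random

def assign_schools(districts, rng):
    """Stage 1: within each district, randomly split schools between programs."""
    assignment = {}
    for schools in districts.values():
        shuffled = list(schools)
        rng.shuffle(shuffled)
        # Alternate down the shuffled list so each district is split as evenly
        # as possible (an assumption, not the study's documented procedure).
        for i, school in enumerate(shuffled):
            assignment[school] = "RAAL" if i % 2 == 0 else "Xtreme Reading"
    return assignment

def assign_students(students, rng, ero_share=0.57):
    """Stage 2: within one school, randomly pick the students for the ERO class."""
    shuffled = list(students)
    rng.shuffle(shuffled)
    n_ero = round(ero_share * len(shuffled))
    return {s: ("ERO" if i < n_ero else "non-ERO") for i, s in enumerate(shuffled)}

rng = random.Random(2006)  # fixed seed so the sketch is reproducible

# Hypothetical districts and schools, for illustration only.
districts = {"District A": ["A1", "A2", "A3", "A4"], "District B": ["B1", "B2"]}
programs = assign_schools(districts, rng)

# One hypothetical school with 79 eligible students.
groups = assign_students([f"student_{i}" for i in range(79)], rng)
```

Because each stage is randomized separately, the student-level assignment supports the ERO versus non-ERO comparison within every school regardless of which program the school received.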



During the second year of the study, the participating high schools identified 2,679 
ninth-grade students with baseline test scores indicating that they were reading two to five 
years below grade level (an average of 79 students per school). Approximately 57 percent of 
these students were randomly assigned to enroll in the ERO class, and the remaining students
made up the study’s control group and were enrolled in or continued in a regularly scheduled
elective class.

Evaluation data were collected with the Group Reading Assessment and Diagnostic
Evaluation (GRADE) reading comprehension and vocabulary tests and a survey. 5 Both instruments
were administered to students at two points in time: a baseline assessment and survey
in the spring of eighth grade and a follow-up assessment and survey at the end of ninth grade. 6 
Follow-up test scores are available for 2,171 (81 percent) of the students in the study sample. 
To learn about the fidelity of program implementation, the study also includes observations of
the supplemental literacy classes during the first and second semester of the school year. 
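
As a quick check on the sample figures quoted above, the arithmetic can be reproduced directly. The variable names are illustrative; the counts come from the text and from the sample sizes reported in Table ES.1.

```python
n_students = 2679   # Cohort 2 study sample
n_schools = 34      # participating high schools
n_followup = 2171   # students with follow-up test scores

# Follow-up respondents by group, as reported in Table ES.1.
ero_respondents = 1264
non_ero_respondents = 907

students_per_school = n_students / n_schools
response_rate = n_followup / n_students

print(f"{students_per_school:.1f} students per school")   # 78.8
print(f"{response_rate:.1%} follow-up response rate")     # 81.0%
print(ero_respondents + non_ero_respondents == n_followup)  # True
```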



Second-Year Implementation 

Each ERO teacher (one per school) was responsible for teaching four sections of the 
ERO class. Each section accommodated between 10 and 15 students. Classes were designed to 
meet for a minimum of 225 minutes per week and were scheduled as a 45-minute class every 
day or as a 75- to 90-minute class that met every other day. 

• Of the 34 teachers who participated in the second year of the study, 25 
had taught the entire first year of the study, and two had taught a por- 
tion of the first year (having replaced a teacher midyear). Seven teachers 
were new to the ERO programs at the start of the second year. 

During the second year of the project, the developers for each of the ERO programs 
provided three types of training and technical assistance to both new and returning ERO teach- 
ers: a three-day summer training institute in July or August 2006, booster training sessions dur- 
ing the 2006-2007 school year, and three 2-day coaching visits during the 2006-2007 school 
year. Prior to the summer institute, teachers new to the ERO programs also attended additional
training sessions at which they were taught the central strategies of the program being implemented
in their school.

5 American Guidance Service, Group Reading Assessment and Diagnostic Evaluation: Teacher’s Scoring
and Interpretive Manual, Level H; and Technical Manual (Circle Pines, MN: American Guidance Service,
2001a, 2001b).

6 In four of the 34 participating schools, baseline testing occurred in the fall of ninth grade rather than the
spring of eighth grade.

The study team assessed the overall fidelity with which the ERO programs were im- 
plemented in each school during the second year of the project. In the context of this study, “fi- 
delity” refers to the degree to which the observed operation of the ERO program in a given high 
school was aligned with the intended learning environment and instructional practices that were 
specified by the model’s developers. The analysis of implementation fidelity in the second year 
of the study is based on two field research visits to each of the 34 high schools — one during 
the first semester and one during the second semester of the 2006-2007 school year. The class- 
room observation protocols used in the site visits provided a structured process for observers to 
rate the characteristics of the ERO classroom learning environments and the use of ERO in- 
structional strategies by teachers. The instrument included ratings for six characteristics (re- 
ferred to as “constructs” from here forward) that are common to both programs, as well as rat- 
ings for seven program-specific constructs. For each construct, a category rating of 1 (“poorly 
aligned”), 2 (“moderately aligned”), or 3 (“well aligned”) was given. 

The analysis of the classroom observation ratings sought to capture implementation fi- 
delity on two key overarching dimensions of both programs: the classroom learning environ- 
ment and the teacher’s use of instructional strategies focused on reading comprehension. A 
composite measure of implementation fidelity was calculated for each of these two dimensions 
by averaging across the relevant characteristics in the observation protocol. A composite rating 
of 2.0 or higher indicates that the school’s ERO program was well aligned with the developers’ 
implementation specifications; a rating of 1.5 to 1.9 means that the program was moderately 
aligned; and a rating of 1.0 to 1.4 means that it was poorly aligned. Following is a summary of 
key findings. 

• At the spring site visit, implementation fidelity in 26 of the 34 schools was 
classified as well aligned on both program dimensions. In seven schools, 
implementation was classified as moderately aligned with the program 
model on at least one of the two key program dimensions and as mod- 
erately or well aligned on the other dimension. In one school, implemen- 
tation was deemed to be poorly aligned with the program models. 

The overall implementation of the ERO program in a given school was classified as 
well aligned if both the classroom environment and the comprehension instruction dimension 
were rated as being well aligned. According to the protocols used for the classroom observa- 
tions, teacher behaviors and classroom activities in these schools were consistently rated as be- 
ing well developed and reflective of the behaviors and activities specified by the developers. At 
the fall site visit, the implementation of the ERO programs in 20 of the 34 schools was classified
as well aligned on both program dimensions, and, at the spring site visit, 26 schools had
attained this benchmark. Because implementation fidelity in the majority of the study schools 
was deemed to be well aligned to the models, the study team also examined the number of 
schools whose implementation of the programs was “very well aligned” to developers’ specifi- 
cations (defined here as a composite score of 2.5 or higher on both program dimensions). At the 
spring site visit, implementation in 13 schools could be classified as such. 

Conversely, a school’s overall implementation fidelity was judged to be poorly aligned 
with the program model if the composite rating for either the classroom learning environment 
dimension or the comprehension instruction dimension was rated as poorly aligned. The ERO 
programs in these schools were not representative of the activities and practices intended by the 
respective program developers and were found to have encountered serious implementation 
problems on at least one of the two key program dimensions during the second year of the 
study. 7 At the fall site visit, implementation of the ERO programs in three of the 34 schools was 
classified as poorly aligned with the program models on at least one of the two program dimen- 
sions. At the spring site visit, implementation at one school was considered to be poorly aligned 
with the program models. 8 
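
The classification rules described in this section can be expressed as a small sketch. The cut points (2.0 and 1.5) and the "both dimensions" and "either dimension" rules come from the text; the function names, the rounding of composites to one decimal place, and the example ratings are assumptions made for illustration, since the report does not list the constructs that feed each dimension.

```python
def composite(ratings):
    """Average the 1-3 construct ratings for one dimension, to one decimal."""
    return round(sum(ratings) / len(ratings), 1)

def classify_dimension(score):
    """Map a composite rating to the alignment category used in the report."""
    if score >= 2.0:
        return "well aligned"
    if score >= 1.5:
        return "moderately aligned"
    return "poorly aligned"

def classify_school(environment_ratings, instruction_ratings):
    """Overall fidelity: poor if either dimension is poor, well if both are well."""
    labels = [classify_dimension(composite(r))
              for r in (environment_ratings, instruction_ratings)]
    if "poorly aligned" in labels:
        return "poorly aligned"
    if all(label == "well aligned" for label in labels):
        return "well aligned"
    return "moderately aligned"

# Hypothetical construct ratings for three schools.
print(classify_school([3, 3, 2], [2, 2, 3]))  # well aligned
print(classify_school([2, 2, 1], [3, 3, 3]))  # moderately aligned
print(classify_school([1, 1, 2], [3, 3, 3]))  # poorly aligned
```

Note that the overall label is driven by the weaker dimension: a school with strong comprehension instruction but a poorly aligned learning environment is still classified as poorly aligned.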

• The number of schools considered to be well aligned with the program 
developers’ specifications for implementation fidelity was greater in the 
second year of the study than in the first year (26 schools in the second 
year, compared with 16 schools in the first year). 

At the spring site visit in the second year of the study, the ERO programs in 33 of the 
34 schools reached an overall level of implementation fidelity that was at least moderately 
aligned to the program models (of these, 26 were considered to be well aligned). This is an im- 
provement over the first year of the study, when 24 of the 34 schools had reached a moderate 
level of alignment at the spring site visit (of these, 16 schools were deemed to be well aligned). 
Also, during the spring site visit of the second year, only one school’s implementation of the 
program was poorly aligned to the developers’ specifications. This is lower than what was 
found during the first-year spring site visit, when 10 schools were ranked as poorly aligned on at 
least one of the two key program dimensions. 



7 In particular, poorly aligned implementation for a given dimension means that the classroom observers
found that at least half of the classroom characteristics were not aligned with the behaviors and activities speci- 
fied by the developers and described in the protocols. 

8 In the second year of the study, implementation-fidelity ratings were similar for the 25 schools where the
ERO teacher taught two full years of the program and for the nine schools where the ERO teacher had replaced
another teacher at some point during the study (an average rating of 2.5 for returning teachers and 2.4 for
replacement teachers, out of a maximum score of 3).
Student Enrollment and Attendance in the ERO Classes and
Participation in Literacy Support Activities 

The study team collected data on the duration of the ERO classes as well as the fre- 
quency with which students attended the ERO classes and participated in other classes or tutor- 
ing services that aimed to improve their reading and writing skills. 

ERO classes in the second year began an average of 2.3 weeks after the start of the 
school year and operated for an average of nine months. Eighteen schools started the ERO pro- 
gram on the first day of school, and five more schools started within the first two weeks that 
classes were in session. The remaining eleven started their ERO programs an average of seven 
weeks after the start of the school year. Among the students randomly assigned to the ERO 
group, 91 percent enrolled in the ERO classes, and 87 percent were still attending the classes at 
the end of the school year. 

• Students in the ERO group attended 79 percent of the scheduled ERO 
classes, and they received an average of 11 hours of ERO instruction per 
month. 

• Students who were randomly assigned to the study’s ERO group re- 
ported a higher frequency of participation in supplemental literacy ser- 
vices than students who were assigned to the non-ERO group. 

The ERO classes served as the primary source of literacy support services for students 
in the study sample. Although the largest difference in the use of supplemental literacy supports 
between the study’s ERO and non-ERO groups occurred in students’ participation in a supple- 
mentary school-based literacy class (an average of 75 yearly sessions for ERO students and 17 
yearly sessions for non-ERO students), ERO students were also significantly more likely to re- 
port working with a tutor in school (an average of 30 yearly sessions, compared with 12 yearly 
sessions for non-ERO students). 



Impact Findings 

The GRADE assessment was used to measure students’ reading achievement prior to 
random assignment (at “baseline”) and then again in the spring at the end of their ninth-grade 
year (at “follow-up”). The GRADE is a norm-referenced, research-based reading assessment 
that is used widely to measure performance and track the growth of an individual student and
groups of students. Because the two ERO programs focus primarily on helping students use 
contextual clues to understand the meaning of words, the reading comprehension subtest of the 
GRADE is the primary measure of reading achievement in this study, while the GRADE vocabulary
subtest is a secondary indicator of the programs’ effectiveness. Performance levels and
impacts on both subtests are presented in standard score units; students with a standard score of
100 points are considered to be reading at grade level. 9 

Following is a summary of the study’s impact findings. 

• When analyzed jointly, the ERO programs produced an increase of 0.8
standard score point on the GRADE reading comprehension subtest.
This corresponds to an effect size of 0.08 standard deviation and is
statistically significant. The overall impact of the programs in the second
year of implementation is not statistically different from their overall
impact in the first year of implementation (0.09 standard deviation).

The top panel of Table ES.1 shows the impacts on spring follow-up reading comprehension
and vocabulary test scores across all 34 participating high schools in the second year of
the study. The first row of data in the table shows that, on average, the reading comprehension 
test scores of students in the ERO group are 0.8 standard score point higher than the scores of 
students in the non-ERO group, which represents a statistically significant impact (its p-value is 
less than or equal to 5 percent). 10 Expressed as a proportion of the overall variability of test 
scores for students in the non-ERO group, this estimated impact represents an effect size of 0.08 
(or 8 percent of the standard deviation of the non-ERO group’s test scores). 
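
The effect-size conversion described above is a simple ratio, sketched here in Python. The standard deviation of the non-ERO group is not reported in this summary; the value of 10 used below is an assumption back-solved from the rounded figures (0.8 divided by 0.08), so it is only indicative.

```python
def effect_size(impact, sd_non_ero):
    """Impact expressed as a share of the non-ERO group's standard deviation."""
    return impact / sd_non_ero

impact = 0.8          # standard score points, from Table ES.1
sd_assumed = 10.0     # assumption: implied by the rounded 0.8 and 0.08

print(round(effect_size(impact, sd_assumed), 2))  # 0.08

# For contrast, dividing by the national-norm standard deviation of 15
# (given in the report's notes) would yield a smaller figure.
print(round(effect_size(impact, 15.0), 3))        # 0.053
```

The contrast shows why the choice of denominator matters: the study sample was selected from a restricted range of baseline scores, so its test scores vary less than those of the national norming sample.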

Figure ES.1 places this impact estimate in the context of the actual and expected change
in the ERO students’ reading comprehension test scores on the GRADE from the beginning of 
ninth grade to the end of ninth grade. The bottom section of the bar shows that students in the 
ERO group achieved an average standard score of 84.6 at the start of their ninth-grade year. 
This corresponds, approximately, to a grade equivalent of 4.9 (the last month of fourth grade) 
and indicates an average reading level at the 14th percentile for ninth-grade students nationally. 

The middle section of the bar shows the estimated growth in test scores experienced by 
the non-ERO group. At the end of the ninth-grade year, the non-ERO group was estimated to 
have achieved an average standard score of 89.3, which corresponds to a grade equivalent of 6.0 
and an average reading level at the 23rd percentile for ninth-grade students nationally. This 

9 Based on the national norms used to calculate these scores, a standard score of 100 on the GRADE read- 
ing comprehension or vocabulary test is average for a representative group of students at the end of their ninth- 
grade year. The standard deviation of the standard score for both tests is 15. 

10 The impact estimates in Table ES.1 are regression-adjusted using ordinary least squares (OLS), controlling
for blocking of random assignment by school and for random differences between the ERO and non-ERO
groups in their baseline reading comprehension test scores and age at random assignment. The values in the 
column labeled “ERO Group” are the observed means for students randomly assigned to the ERO group. The 
“Non-ERO Group” values in the next column are the regression-adjusted means for students randomly as- 
signed to the non-ERO group, using the observed mean covariate values for the ERO group as the basis for the 
adjustment. 
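
One way to write down the regression adjustment described in note 10 is shown below. The notation is an assumption introduced here for illustration and is not taken from the report: $Y_{is}$ is the follow-up test score of student $i$ in school $s$, $\alpha_s$ is a school-specific intercept that accounts for blocking of random assignment by school, and $\beta$ is the estimated impact of assignment to the ERO group.

```latex
Y_{is} = \alpha_s
       + \beta \, \mathrm{ERO}_{is}
       + \gamma_1 \, \mathrm{Pretest}_{is}
       + \gamma_2 \, \mathrm{Age}_{is}
       + \varepsilon_{is}
```

Under this sketch, the regression-adjusted non-ERO mean would equal the observed ERO group mean minus the estimate of $\beta$, which is consistent with the way the two columns of the table are described in the note.
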
The Enhanced Reading Opportunities Study

Table ES.1

Impacts on Reading Achievement,
Cohort 2 Follow-Up Respondent Sample

                                                                               Estimated  P-Value for
                                              ERO      Non-ERO    Estimated       Impact    Estimated
Outcome                                     Group        Group       Impact  Effect Size       Impact

All schools

Reading comprehension
  Average standard score                     90.1         89.3        0.8 *       0.08 *        0.042
  Corresponding grade equivalent              6.1          6.0
  Corresponding percentile                     25           23

Reading vocabulary
  Average standard score                     93.5         93.5        0.0         0.00          0.986
  Corresponding grade equivalent              7.8          7.8
  Corresponding percentile                     32           32

Sample size                                 1,264          907

Reading Apprenticeship Academic Literacy schools

Reading comprehension
  Average standard score                     90.2         88.9        1.4 *       0.14 *        0.015
  Corresponding grade equivalent              6.1          5.9
  Corresponding percentile                     25           23

Reading vocabulary
  Average standard score                     93.4         93.8       -0.4        -0.04          0.428
  Corresponding grade equivalent              7.7          7.8
  Corresponding percentile                     32           33

Sample size                                   645          470

Xtreme Reading schools

Reading comprehension
  Average standard score                     90.0         89.7        0.2         0.02          0.672
  Corresponding grade equivalent              6.1          6.0
  Corresponding percentile                     25           24

Reading vocabulary
  Average standard score                     93.5         93.1        0.4         0.04          0.468
  Corresponding grade equivalent              7.8          7.7
  Corresponding percentile                     32           31

Sample size                                   619          437